Abstract

We present a top-down approach to the study of the dynamics of icosahedral virus capsids, in which each protein is approximated by a point mass. Although this is a rather crude coarse-graining, we argue that it highlights several generic features of vibrational spectra which have so far been overlooked. We furthermore discuss the consequences of approximate inversion symmetry, as well as the role played by Viral Tiling Theory in the study of virus capsid vibrations.



1 Introduction 

It has been experimentally observed that viruses can alter their shape to fulfil specific functions. In particular, they may swell during maturation [H-0], twist to release their genetic material during infection, or morph during assembly. Such large-scale conformational changes are consistent with the widespread hypothesis that viruses vibrate, and it is therefore of interest to study their dynamics with the mathematical and computational techniques that have been tried and tested in the context of biomacromolecule vibrations (see [3] for a review).

Normal mode analysis is one such method 0-11], which has been successfully applied to the study of proteins and a variety of viruses to date 0Q. A major challenge is the huge number of degrees of freedom involved in such systems. Various degrees of coarse-graining, as well as group-theoretical methods (inspired by their successful application to small molecules and fullerenes 3-1^]), have been implemented in computer simulations in order to extract information on the low-frequency modes of vibration which are thought to be relevant for protein and virus function fl^. 20 1. Although such theoretical data are becoming increasingly available for icosahedral systems 1 211 ] thanks to advances in computer power, no clear and insightful vibrational pattern across icosahedral viruses has emerged yet. The art of coarse-graining is a delicate one, as it is often argued that excessive coarse-graining produces a dynamical picture that has little to do with reality. What is needed is a hierarchy of coarse-grained calculations, which will hopefully reveal complementary aspects of the dynamical jigsaw.
We argue here that even the crudest approximation, where each capsid protein is treated as a point mass located at its centre of mass, is helpful in highlighting dynamical features that are present in more sophisticated normal mode analyses but have so far been overlooked. Our initial mathematical motivation was to assess to what extent Viral Tiling Theory, a recently proposed model for icosahedral viral capsids which solves a classification puzzle in the Caspar-Klug nomenclature [22, 23], provides new insight into the dynamics of viruses. In particular, we ask whether there is a correlation between the vibrational patterns of viruses with a given number of coat proteins and their viral tiling.

The paper is organised as follows. In Section 2, we briefly describe Viral Tiling Theory in the context of the viral capsids RYMV (T = 3), HK97 (T = 7l) and SV40 (pseudo T = 7d), with emphasis on the different, subtle ways in which the underlying icosahedral symmetry manifests itself in these three cases. In particular, this has implications for the group-theoretical analysis of normal modes of vibration. An expanded version of these remarks, applicable to viruses and phages of all T-numbers, is available in [24]. Section 3 provides a simple normal mode analysis for the three capsids above, using group-theoretical techniques reminiscent of those used to calculate the vibrational modes of small molecules. This paves the way for the more extensive study performed in [25], which reveals an intriguing universal pattern of low-frequency normal modes. We conclude with some open questions prompted by our investigations.

2 Tilings of Rice Yellow Mottle, Hong-Kong 97 and Simian Virus 40

Viral Tiling Theory provides an elegant way of encoding the icosahedral symmetry of viral capsids by keeping track of the location of coat proteins and the orientation of capsomers on the viral shell, while also keying in the dominant¹ bond structure between those proteins.

The Rice Yellow Mottle Virus (RYMV) belongs to the Sobemovirus genus. It is classified as a T = 3 virus in the Caspar-Klug labelling system [26], and its icosahedral capsid accommodates 180 coat proteins, or subunits, which are clustered in 12 pentamers around the 5-fold axes and 20 hexamers around the 3-fold global symmetry axes of the icosahedron. The locations of the proteins are consistent with a triangular tiling à la Caspar-Klug, and each triangular tile encodes trimer interactions between coat proteins, as represented in Fig. 1. The HK97 bacteriophage, on the other hand, has a T = 7l capsid made of 420 proteins arranged in 12 pentamers and 60 hexamers, with four types of dimer interactions modelled by rhomb prototiles; see Fig. 2. The SV40 virus is a member of the Polyomaviridae family and has a pseudo T = 7d capsid which accommodates 360 coat proteins organised in 72 pentamers. Its tiling uses two types of spherical prototiles, namely rhombs, encoding two types of dimer interactions, and kites, encoding trimer interactions, as represented in Figure 2 of reference [27]. SV40 is an example of an all-pentamer capsid, for which the Caspar-Klug classification is not applicable, but whose symmetries are captured by Viral Tiling Theory.

In order to extract qualitative features of vibrational patterns from viral capsids, we restrict ourselves to a coarse-grained approximation in which each capsid protein is replaced by a point mass located at the centre of mass of that protein. This centre of mass is calculated by taking into consideration all crystallographically identified atoms of the protein, according to the data stored in the Protein Data Bank or, equivalently, on the VIPER website. We then assess how much the resulting distribution of point masses deviates from a theoretical distribution exhibiting a centre of inversion. On the basis of the experimental data available to us, we argued in [24] that the SV40 capsid has an approximate centre of inversion, while RYMV² and HK97 do not. This has subtle consequences for the group-theoretical properties of normal modes of vibration: when the capsid exhibits an effective centre of inversion, the relevant group is the full icosahedral group $H_3$ with 120 elements (usually called $I_h$ in the science literature), while in the absence of a centre of inversion it is reduced to its subgroup $I$ of 60 proper rotations.

¹ In some cases, there exist stronger bonds between proteins pertaining to different tiles; RYMV is an example.

Figure 1: In the above picture, the locations of the T = 3 RYMV capsid proteins coincide with their centres of mass, calculated from the experimental data collected in the file 1f2n.vdb. A triangular tiling (dashed lines) is superimposed on the icosahedral structure, and the colour coding is faithful to that of the VIPER website: the A chain is blue, the B chain red and the C chain green. The grey shaded triangular prototiles highlight trimer interactions between capsid proteins, while the blue shaded region corresponds to the fundamental domain of the proper rotation subgroup $I$ of the full icosahedral group $H_3$.

A viral capsid with $N$ 'point-mass' coat proteins has $3N$ degrees of freedom, and hence $3N$ modes of vibration, of which 6 are associated with the 3 rotations and 3 translations of the capsid as a whole. These are therefore not genuine normal modes of vibration. Group theory accounts for the degeneracies of the vibrational modes and provides a means to organise the normal mode spectrum of a given capsid [28]. A key ingredient in this exercise is the construction of the displacement representation of the capsid, which is a reducible representation of $H_3$ or $I$ according to whether or not the distribution of capsid proteins exhibits a centre of inversion. Such a representation consists of 120 (resp. 60) matrices $\Gamma^{\rm displ}(g)$, $g \in H_3$ (resp. $I$), of size $3N \times 3N$, which encode how proteins are interchanged under the action of each element $g$, as well as how the displacements of each protein from its equilibrium position are rotated under the action of $g$. The latter information is gathered in $3 \times 3$ orthogonal matrices $R(g)$, which form an irreducible representation of $H_3$ (resp. $I$), while the former is encoded in permutation matrices $P(g)$ of size $N \times N$, so that

$$\Gamma^{\rm displ}(g) = P(g) \otimes R(g), \qquad \forall\, g \in H_3 \ (\text{resp. } I). \qquad (2.1)$$

The permutation matrices $P(g)$ act on vectors whose components are the equilibrium positions $\vec r^{\,0}_i$, $i = 1, \dots, N$, of the $N$ proteins. The entry $P_{ij}(g)$ of the permutation matrix is 1 if $\vec r^{\,0}_j$ is mapped onto $\vec r^{\,0}_i$ by $g$, and zero otherwise.
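A minimal NumPy sketch of (2.1) follows, assuming displacements are stored protein by protein (three Cartesian components per protein), so that `np.kron(P, R)` places the block $P_{ij}(g)R(g)$ at row block $i$, column block $j$. The five-point configuration and the rotation by π about the z-axis are hypothetical toy inputs, not capsid data; unlike real capsid proteins, one point sits on the rotation axis so that the trace factorisation ${\rm Tr}\,P(g)\,{\rm Tr}\,R(g)$ is non-trivial. The function names are illustrative.

```python
import numpy as np

def permutation_matrix(positions, R, tol=1e-8):
    """P[i, j] = 1 if R maps the equilibrium position of protein j onto that of protein i."""
    N = len(positions)
    P = np.zeros((N, N))
    for j, r in enumerate(positions):
        image = R @ r
        # find the protein whose equilibrium position coincides with the image
        matches = [i for i, s in enumerate(positions) if np.linalg.norm(s - image) < tol]
        if len(matches) != 1:
            raise ValueError("R does not permute the given positions")
        P[matches[0], j] = 1.0
    return P

def displacement_matrix(positions, R):
    """Gamma^displ(g) = P(g) (Kronecker product) R(g), a 3N x 3N matrix as in (2.1)."""
    return np.kron(permutation_matrix(positions, R), R)

# Hypothetical toy configuration: two pairs swapped by R, one point on the z-axis
R = np.diag([-1.0, -1.0, 1.0])                 # rotation by pi about the z-axis
positions = np.array([[1.0, 0.0, 0.5], [-1.0, 0.0, 0.5],
                      [0.0, 1.0, -0.5], [0.0, -1.0, -0.5],
                      [0.0, 0.0, 1.0]])
G = displacement_matrix(positions, R)
P = permutation_matrix(positions, R)
print(G.shape)                                 # (15, 15)
print(np.trace(G), np.trace(P) * np.trace(R))  # both equal 1 * (1 + 2 cos pi) = -1
```

Because both factors are orthogonal, the resulting $\Gamma^{\rm displ}(g)$ is orthogonal as well.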

Once the displacement representation is constructed, it remains to invoke the well-known property that it can be written in block-diagonal form with the help of a $3N \times 3N$ matrix $U$³ such that

$$U\,\Gamma^{\rm displ}(g)\,U^{-1} = \bigoplus_{p} n_p\,\Gamma^{p}(g), \qquad (2.2)$$

where the multiplicities $n_p$ are obtained via the following character formula:

$$n_p = \frac{1}{\dim H_3}\sum_{g \in H_3} \chi^{\rm displ}(g)^{*}\,\chi^{p}(g) \qquad \text{or} \qquad n_p = \frac{1}{\dim I}\sum_{g \in I} \chi^{\rm displ}(g)^{*}\,\chi^{p}(g), \qquad (2.3)$$

where $\dim H_3 = 120$ and $\dim I = 60$ denote the numbers of elements of the two groups.

² The argument for RYMV is similar to the argument given for TBSV in [24].

Figure 2: The locations of the T = 7l HK97 capsid proteins coincide with their centres of mass, calculated from the experimental data collected in the file 2fte.vdb. A rhomb tiling (dashed lines) is superimposed on the icosahedral structure, and the colour coding is faithful to that of the VIPER website: A chain (blue), B chain (red), C chain (green), D chain (yellow), E chain (cyan), F chain (purple) and G chain (pink). The rhomb prototiles highlight four types of dimer interactions between capsid proteins, while the blue shaded region corresponds to the fundamental domain of the proper rotation subgroup $I$ of the full icosahedral group $H_3$.
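The second expression in (2.3) can be evaluated class by class, since characters are constant on conjugacy classes. The sketch below hard-codes the standard character table of the rotation group $I$ (classes $e$, $12C_5$, $12C_5^2$, $20C_3$, $15C_2$, with φ the golden ratio); the dictionary layout and the function name are our own illustrative choices. As a check, feeding in the characters of the $3 \times 3$ rotation matrices themselves returns a single copy of the vector representation.

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # golden ratio

# Conjugacy classes of the rotation group I and their sizes
CLASS_SIZES = {"e": 1, "12C5": 12, "12C5^2": 12, "20C3": 20, "15C2": 15}

# Standard character table of I; columns follow the order of CLASS_SIZES
CHARACTERS = {
    "1":  [1, 1, 1, 1, 1],
    "3":  [3, PHI, 1 - PHI, 0, -1],
    "3'": [3, 1 - PHI, PHI, 0, -1],
    "4":  [4, -1, -1, 1, 0],
    "5":  [5, 0, 0, -1, 1],
}

def multiplicities(chi_displ):
    """Second expression in (2.3), summed class by class (characters of I are real)."""
    sizes = list(CLASS_SIZES.values())
    order = sum(sizes)                                      # dim I = 60
    return {p: sum(s * cd * cp for s, cd, cp in zip(sizes, chi_displ, row)) / order
            for p, row in CHARACTERS.items()}

N = 180                                                     # RYMV
n = multiplicities([3 * N, 0, 0, 0, 0])                     # only g = e fixes proteins
print({p: round(v, 9) for p, v in n.items()})
```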



The characters $\chi^{p}(g)$ of the irreducible representations of the icosahedral group can be found in [24], while the characters $\chi^{\rm displ}(g)$ of the displacement representation are obtained by inspection of the displacement representation considered. Note that, in view of the definition of the permutation matrices $P(g)$ given above, and the fact that the characters of a representation are the traces of its constituent matrices, one has



$$\chi^{\rm displ}(g) = {\rm Tr}\,P(g)\;{\rm Tr}\,R(g) = \pm\,(\text{number of proteins unmoved by } g)\,(1 + 2\cos\theta), \qquad (2.4)$$

where $\theta$ is the angle of the proper rotation associated with $g$, and the minus sign is taken when $g \in H_3 \setminus I$. Hence $\chi^{\rm displ}(g)$ vanishes when $\theta = 2\pi/3$, or whenever $g$ is such that no protein of the capsid is kept fixed under its action.
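Formula (2.4) translates directly into a small helper; the function name and its arguments are hypothetical. The three calls illustrate the identity, the vanishing at $\theta = 2\pi/3$, and an improper element $g_0 g_2$ (inversion combined with a 2-fold rotation) fixing 24 proteins.

```python
import math

def chi_displ(n_fixed, theta, improper=False):
    """Character (2.4) of the displacement representation for one group element.

    n_fixed  -- number of proteins left unmoved by g
    theta    -- angle of the proper rotation associated with g
    improper -- True if g lies in H3 \\ I (rotation combined with the inversion)
    """
    sign = -1.0 if improper else 1.0
    return sign * n_fixed * (1 + 2 * math.cos(theta))

print(chi_displ(180, 0.0))                          # identity: 3N for N = 180
print(chi_displ(7, 2 * math.pi / 3))                # zero up to round-off
print(chi_displ(24, math.pi, improper=True))        # g0*g2 fixing 24 proteins
```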

The decomposition of the displacement representation of a given capsid thus boils down to the computation of the coefficients $n_p$ in (2.3), which, in view of (2.4), receive contributions only from group elements $g$ that leave at least one capsid protein unmoved (and for which $\theta \neq 2\pi/3$).

It can be shown that, for distributions of capsid proteins with no centre of inversion, the only group element which keeps any 'point-mass' protein unmoved is the identity element $g = e$ (under whose action all $N$ proteins are obviously fixed). The second expression in (2.3) thus yields

$$n_p = \frac{1}{\dim I}\,\chi^{\rm displ}(e)^{*}\,\chi^{p}(e) = \frac{N}{20}\,\chi^{p}(e), \qquad (2.5)$$

where we used $\dim I = 60$ and $\chi^{\rm displ}(e) = 3N$ (taking the plus sign and $\theta = 0$ in (2.4)). Recalling that $\chi^{p}(e) = p$, the dimension of the representation $\Gamma^{p}$, we arrive at the following decomposition formula:

$$\Gamma^{\rm displ}_{3N} = \frac{N}{20}\left\{\Gamma^{+}_{1} + 3\,\Gamma^{+}_{3} + 3\,\Gamma^{+}_{3'} + 4\,\Gamma^{+}_{4} + 5\,\Gamma^{+}_{5}\right\}. \qquad (2.6)$$

³ The explicit form of the matrix $U$ is not needed at this stage, but rather when the force matrix is partially diagonalised to obtain the frequencies of vibration.
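When only the identity fixes proteins, (2.6) reduces the decomposition to arithmetic. A minimal sketch, with the labels "1", "3", "3'", "4", "5" standing for $\Gamma^{+}_{1}, \dots, \Gamma^{+}_{5}$; the function name is hypothetical, and the dimension count is included only as a consistency check that the irreducible pieces account for all $3N$ degrees of freedom.

```python
def decomposition_I(N):
    """Multiplicities n_p = (N/20) * dim(p) of (2.6), valid when only g = e fixes proteins."""
    if N % 60:
        raise ValueError("an icosahedral capsid has N = 60k proteins")
    dims = {"1": 1, "3": 3, "3'": 3, "4": 4, "5": 5}
    n = {p: (N // 20) * d for p, d in dims.items()}
    # Consistency check: (N/20) * (1 + 9 + 9 + 16 + 25) = 3N
    if sum(n[p] * dims[p] for p in dims) != 3 * N:
        raise RuntimeError("dimension count failed")
    return n

for name, N in [("RYMV", 180), ("HK97", 420), ("SV40, inversion ignored", 360)]:
    print(name, decomposition_I(N))
```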

The number $N$ of capsid proteins is always a multiple of sixty, $N = 60k$. In the many cases where the proteins are organised in 12 pentamers and a number of hexamers, $k$ is the T-number of the Caspar-Klug nomenclature. Then the number of non-degenerate normal modes in the singlet (symmetric) representation $\Gamma^{+}_{1}$ is $3T$, while a $p$-dimensional representation occurs $3pT$ times, giving $3pT$ multiplets of $p$-fold degenerate normal modes ($3p^{2}T$ modes in total), for $p = 3$ (twice, for $\Gamma^{+}_{3}$ and $\Gamma^{+}_{3'}$), $4$ and $5$. In particular, $N = 180$ for RYMV, and the displacement representation decomposes into

$$\Gamma^{\rm displ}_{540,\,{\rm RYMV}} = 9\,\Gamma^{+}_{1} + 27\,\Gamma^{+}_{3} + 27\,\Gamma^{+}_{3'} + 36\,\Gamma^{+}_{4} + 45\,\Gamma^{+}_{5}. \qquad (2.7)$$

The 6 non-genuine modes belong to two copies of the $\Gamma^{+}_{3}$ irreducible representation. There are nine non-degenerate and forty-five 5-fold degenerate Raman active modes, as well as twenty-five 3-fold degenerate infrared active modes (the 27 copies of $\Gamma^{+}_{3}$ minus the two carrying rigid-body motions).

Since $N = 420$ for HK97, the displacement representation decomposes into

$$\Gamma^{\rm displ}_{1260,\,{\rm HK97}} = 21\,\Gamma^{+}_{1} + 63\,\Gamma^{+}_{3} + 63\,\Gamma^{+}_{3'} + 84\,\Gamma^{+}_{4} + 105\,\Gamma^{+}_{5}, \qquad (2.8)$$

and, by the same argument as above, one arrives at twenty-one non-degenerate and one hundred and five 5-fold degenerate Raman active modes, as well as sixty-one 3-fold degenerate infrared active modes.
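The Raman and infrared counts quoted for RYMV and HK97 follow from the multiplicities by bookkeeping. The sketch assumes the standard selection rules for the rotation group $I$ (Raman activity for $\Gamma_1$ and $\Gamma_5$, the content of a symmetric tensor; infrared activity for the vector representation $\Gamma_3$) and subtracts the two copies of $\Gamma_3$ carried by rigid rotations and translations. The function name and dictionary keys are illustrative.

```python
def active_mode_counts_I(N):
    """Raman/infrared multiplet counts for a capsid analysed with the rotation group I.

    Assumes the standard selection rules for I: Raman active modes transform like a
    symmetric tensor (Gamma_1 + Gamma_5), infrared active modes like a vector (Gamma_3).
    The two Gamma_3 copies carried by rigid rotations and translations are removed.
    """
    dims = {"1": 1, "3": 3, "3'": 3, "4": 4, "5": 5}
    n = {p: (N // 20) * d for p, d in dims.items()}      # multiplicities from (2.6)
    return {"raman_nondegenerate": n["1"],
            "raman_5fold": n["5"],
            "infrared_3fold": n["3"] - 2}

print(active_mode_counts_I(180))   # RYMV
print(active_mode_counts_I(420))   # HK97
```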

The normal modes of the SV40 capsid would be organised according to the decomposition (2.6) with $N = 360$ if we did not take into account that the protein distribution on the capsid exhibits an approximate centre of inversion. We would then have

$$\Gamma^{\rm displ}_{1080,\,{\rm SV40}} = 18\,\Gamma^{+}_{1} + 54\,\Gamma^{+}_{3} + 54\,\Gamma^{+}_{3'} + 72\,\Gamma^{+}_{4} + 90\,\Gamma^{+}_{5}. \qquad (2.9)$$

Instead, we use the first expression in (2.3) and note that the distribution of 'point-mass' proteins on the capsid is such that, besides the identity element $g = e$ of $H_3$, which leaves all $N$ proteins unmoved, the fifteen rotations $g_2^{(i)}$, $i = 1, \dots, 15$, about the 2-fold axes of the icosahedron, when combined with the inversion $g_0$, produce 15 further elements $g_0 g_2^{(i)}$, each of which leaves 24 capsid proteins unmoved. Those fifteen group elements are in the same conjugacy class and therefore have the same character: for the representations $\Gamma^{+}_{p}$, $\chi^{p}(g_0 g_2^{(i)}) = 1$ for $p = 1, 5$, $\chi^{p}(g_0 g_2^{(i)}) = -1$ for $p = 3, 3'$ and $\chi^{4}(g_0 g_2^{(i)}) = 0$, while the signs are reversed for the representations $\Gamma^{-}_{p}$. Taking into account that for these group elements ${\rm Tr}\,R(g_0 g_2^{(i)}) = -(1 + 2\cos\pi) = 1$, we arrive at the following decomposition of the displacement representation:

$$\Gamma^{\rm displ}_{1080,\,{\rm SV40}} = 12\,\Gamma^{+}_{1} + 24\,\Gamma^{+}_{3} + 24\,\Gamma^{+}_{3'} + 36\,\Gamma^{+}_{4} + 48\,\Gamma^{+}_{5} + 6\,\Gamma^{-}_{1} + 30\,\Gamma^{-}_{3} + 30\,\Gamma^{-}_{3'} + 36\,\Gamma^{-}_{4} + 42\,\Gamma^{-}_{5}. \qquad (2.10)$$

The six non-genuine modes of vibration are confined to one copy of the 3-dimensional irreducible representation $\Gamma^{+}_{3}$ (rotations) and one copy of the 3-dimensional irreducible representation $\Gamma^{-}_{3}$ (translations). There are twelve non-degenerate and forty-eight 5-fold degenerate Raman active modes, and twenty-nine 3-fold degenerate infrared active modes (the thirty copies of $\Gamma^{-}_{3}$ minus the translations).
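The SV40 decomposition (2.10) can be checked in the same spirit, using the first expression in (2.3) with $\dim H_3 = 120$. This sketch assumes that only the identity and the fifteen elements $g_0 g_2^{(i)}$ fix proteins, with 24 fixed proteins for each of these fifteen elements (the value that reproduces (2.10)), and uses $\chi^{p\pm}(g_0 g) = \pm\chi^{p}(g)$ for the parity-doubled representations of $H_3 = I \times \{e, g_0\}$. The function name and labels are illustrative.

```python
import math

# Characters of I on the identity and on a 2-fold rotation C2
CHI_E = {"1": 1, "3": 3, "3'": 3, "4": 4, "5": 5}
CHI_C2 = {"1": 1, "3": -1, "3'": -1, "4": 0, "5": 1}

def decomposition_H3(N, n_fixed_by_reflection):
    """Multiplicities n_p^{+/-} from the first expression in (2.3), dim H3 = 120.

    Contributing elements: the identity (chi_displ = 3N) and the 15 elements
    g0*g2^(i), each fixing n_fixed_by_reflection proteins, with
    Tr R = -(1 + 2 cos pi) = 1. For an irrep p^{+/-}, chi(g0*g2) = +/- chi_p(C2).
    """
    chi_refl = n_fixed_by_reflection * (-(1 + 2 * math.cos(math.pi)))
    result = {}
    for p in CHI_E:
        for parity, s in (("+", 1), ("-", -1)):
            total = 3 * N * CHI_E[p] + 15 * chi_refl * s * CHI_C2[p]
            result[p + parity] = total / 120
    return result

for label, value in decomposition_H3(360, 24).items():
    print(label, value)
```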
3 Low frequency modes



Our calculation of the low-frequency normal modes is based on a spring-mass model, in which the $N$ 'point-mass' proteins are connected by a network of elastic forces described by a harmonic potential which is manifestly rotation and translation invariant,

$$V = \frac{1}{2}\sum_{1 \le m < n \le N} k_{mn}\left(|\vec r_m - \vec r_n| - |\vec r^{\,0}_m - \vec r^{\,0}_n|\right)^{2}. \qquad (3.11)$$
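A minimal sketch of a potential of the form (3.11), assuming a factor $\tfrac{1}{2}k_{mn}$ per bonded pair with each pair counted once; the function name, the three-protein configuration and the spring constants are hypothetical. It checks numerically that a rigid rotation plus translation costs no energy, whereas stretching a bond does.

```python
import math

def elastic_energy(positions, rest_positions, springs):
    """V = 1/2 * sum over springs k_mn (|r_m - r_n| - |r0_m - r0_n|)^2.

    positions, rest_positions -- lists of (x, y, z) tuples
    springs -- dict mapping a pair (m, n) with m < n to its spring constant k_mn
    """
    V = 0.0
    for (m, n), k in springs.items():
        d = math.dist(positions[m], positions[n])
        d0 = math.dist(rest_positions[m], rest_positions[n])
        V += 0.5 * k * (d - d0) ** 2
    return V

def rotate_z(p, angle):
    """Rotate the point p about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])

rest = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
springs = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 0.5}    # hypothetical relative strengths

# Rigid motion: rotate about z, then translate
moved = [tuple(a + b for a, b in zip(rotate_z(p, 0.3), (0.2, -0.1, 0.5))) for p in rest]
stretched = [(1.1, 0.0, 0.0), rest[1], rest[2]]
print(elastic_energy(moved, rest, springs))          # zero up to round-off
print(elastic_energy(stretched, rest, springs))      # positive
```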



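The associated force matrix or 'Hessian' is given by

$$F^{ij}_{mn} = \left.\frac{\partial^{2} V}{\partial r^{i}_{m}\,\partial r^{j}_{n}}\right|_{\vec r = \vec r^{\,0}} = \begin{cases} \displaystyle\sum_{l \neq m} k_{ml}\,\frac{(r^{0}_{m} - r^{0}_{l})^{i}\,(r^{0}_{m} - r^{0}_{l})^{j}}{|\vec r^{\,0}_{m} - \vec r^{\,0}_{l}|^{2}} & \text{if } m = n, \\[3ex] \displaystyle -k_{mn}\,\frac{(r^{0}_{m} - r^{0}_{n})^{i}\,(r^{0}_{m} - r^{0}_{n})^{j}}{|\vec r^{\,0}_{m} - \vec r^{\,0}_{n}|^{2}} & \text{otherwise.} \end{cases} \qquad (3.12)$$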



In the above formulae, the vector $\vec r^{\,0}_m$ refers to the equilibrium position of protein $m$ and the vector $\vec r_m$, of components $r^i_m$, to its position after elastic displacement, all vectors originating at the centre of the capsid. The masses of the proteins are all set to unity (reflecting the fact that the various protein chains in a capsid have, to a good approximation, identical masses), and $k_{mn} = k_{nm}$ is the spring constant of the spring connecting protein $m$ to protein $n$. The set of non-zero spring constants we choose, i.e. the topology of the elastic network we adopt, is dictated by the association energies listed in VIPER for RYMV (1f2n.vdb), HK97 (2fte.vdb) and SV40 (1sva.vdb). Fig. 3 encodes the bonds provided by VIPER before acting on them with the icosahedral group in order to generate the complete spring network.
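A force matrix of the form (3.12) can be assembled bond by bond: each spring contributes $k_{mn}\,\hat e\,\hat e^{T}$ (with $\hat e$ the unit bond vector at equilibrium) to the two diagonal blocks and minus that to the two off-diagonal blocks. The sketch below is a brute-force NumPy version of this construction, i.e. the approach the group-theoretical reduction is designed to avoid for large $N$. As a hypothetical test network we use the 12 vertices of an icosahedron joined by its 30 edges; this network is rigid, so exactly six zero eigenvalues (the rigid-body modes) are expected. The function name is illustrative.

```python
import numpy as np

def force_matrix(r0, springs):
    """Hessian (3.12) of the spring network at equilibrium, as a 3N x 3N array.

    r0      -- (N, 3) array of equilibrium positions
    springs -- dict {(m, n): k_mn}, each bonded pair listed once
    """
    N = len(r0)
    F = np.zeros((3 * N, 3 * N))
    for (m, n), k in springs.items():
        d = r0[m] - r0[n]
        block = k * np.outer(d, d) / (d @ d)     # k_mn * e e^T, e the unit bond vector
        F[3*m:3*m+3, 3*n:3*n+3] -= block
        F[3*n:3*n+3, 3*m:3*m+3] -= block
        F[3*m:3*m+3, 3*m:3*m+3] += block
        F[3*n:3*n+3, 3*n:3*n+3] += block
    return F

# Hypothetical test network: icosahedron with vertices at cyclic permutations of (0, +-1, +-phi)
phi = (1 + 5 ** 0.5) / 2
verts = []
for a in (-1.0, 1.0):
    for b in (-phi, phi):
        verts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
r0 = np.array(verts)
edges = {(m, n): 1.0 for m in range(12) for n in range(m + 1, 12)
         if abs(np.linalg.norm(r0[m] - r0[n]) - 2.0) < 1e-9}   # edge length is 2
F = force_matrix(r0, edges)
eigenvalues = np.linalg.eigvalsh(F)          # unit masses: eigenvalues are omega^2
n_zero = int(np.sum(np.abs(eigenvalues) < 1e-9))
print(len(edges), n_zero)
```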




Figure 3: Inter-protein bonds are given for (a) RYMV, (b) HK97 and (c) SV40. The relative strengths are represented by line segments of varying thickness, from the strongest bonds (thick lines) to the weakest (thin lines). The above diagrams should be read in conjunction with the figures of Section 2.
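The complete spring network is obtained by acting with the icosahedral group on the bonds of Fig. 3. A minimal sketch follows, assuming the 60 rotations of $I$ are generated numerically by closure from three rotations of a reference icosahedron with vertices at the cyclic permutations of $(0, \pm 1, \pm\varphi)$, and that seed proteins and bonds are supplied as explicit coordinates. The seed point, bonds and strengths in the usage lines are hypothetical, not VIPER data, and the function names are illustrative.

```python
import numpy as np

def close_group(generators, tol=1e-9):
    """All products of the generators: a finite matrix group built by closure."""
    group, frontier = [np.eye(3)], [np.eye(3)]
    while frontier:
        new = []
        for g in frontier:
            for h in generators:
                gh = g @ h
                if not any(np.allclose(gh, x, atol=tol) for x in group):
                    group.append(gh)
                    new.append(gh)
        frontier = new
    return group

def expand_network(seed_positions, seed_bonds, group, tol=1e-6):
    """Images of seed proteins and seed bonds under every group element.

    seed_positions -- representative protein positions (3-vectors)
    seed_bonds     -- list of (a, b, k): endpoint positions a, b and relative strength k
    Returns (positions, springs) with springs = {(m, n): k}, m < n.
    """
    positions = []

    def index_of(p):
        # return the index of an existing protein at p, or register a new one
        for i, q in enumerate(positions):
            if np.linalg.norm(p - q) < tol:
                return i
        positions.append(p)
        return len(positions) - 1

    for g in group:
        for p in seed_positions:
            index_of(g @ p)
    springs = {}
    for g in group:
        for a, b, k in seed_bonds:
            m, n = sorted((index_of(g @ a), index_of(g @ b)))
            springs[(m, n)] = k
    return np.array(positions), springs

phi = (1 + 5 ** 0.5) / 2
c2_z = np.diag([-1.0, -1.0, 1.0])                                    # 2-fold axis along z
c3 = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # 3-fold axis (1, 1, 1)
u = np.array([1.0, phi ** 2, phi]) / (2 * phi)                       # unit vector to an edge midpoint
c2_edge = 2 * np.outer(u, u) - np.eye(3)                             # 2-fold axis through it
I_group = close_group([c2_z, c3, c2_edge])
print(len(I_group))

# Hypothetical seed: one generic protein and two bonds to symmetry-related copies
p = np.array([0.21, 0.47, 0.86])
bonds = [(p, c2_z @ p, 1.0), (p, c3 @ p, 0.5)]
positions, springs = expand_network([p], bonds, I_group)
print(len(positions), len(springs))
```

The first seed bond is mapped onto itself by the 2-fold rotation, so its orbit contains 30 bonds; the second has a trivial stabiliser and yields 60.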

We have used the relative values of these energies, and are therefore left with one parameter $k$ in the force matrix, which sets the overall scale of the vibration frequencies. We are not aware of any experimental measurements of association energies between capsid proteins for the viruses and phages we are considering, and the absolute theoretical values calculated in [29] must be taken with extreme caution.



The force matrices $F_{mn}$ we consider here have size $3N \times 3N$, with $N = 180$ for RYMV, $N = 420$ for HK97 and $N = 360$ for SV40. Although computers can handle a brute-force diagonalisation of such matrices, whose eigenvalues are the squares of the sought frequencies of the normal modes, a group-theoretical approach considerably reduces the size of the matrices to be diagonalised and, above all, yields information on the distribution of normal modes among the irreducible representations of the icosahedral group. This proves useful in an analysis of universal features of such vibrations.
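Whether obtained by brute force or block by block, the eigenvalues of the force matrix must be converted to frequencies and grouped into degenerate multiplets. A minimal sketch, assuming unit masses (so $\omega = \sqrt{\lambda}$) and arbitrary tolerances; the function name is hypothetical and the eigenvalue list is made up for illustration. A purely numerical grouping cannot distinguish symmetry-enforced degeneracies from accidental near-coincidences, which is one more reason to prefer the representation-theoretic labelling.

```python
import math

def multiplets(eigenvalues, rel_tol=1e-6, zero_tol=1e-10):
    """Group force-matrix eigenvalues into (frequency, degeneracy) pairs.

    With unit masses each eigenvalue is omega^2, so omega = sqrt(eigenvalue).
    Eigenvalues below zero_tol (including small negative round-off) count as zero modes.
    """
    freqs = sorted(math.sqrt(lam) if lam > zero_tol else 0.0 for lam in eigenvalues)
    groups = []
    for w in freqs:
        if groups and abs(w - groups[-1][0]) <= rel_tol * max(w, groups[-1][0]):
            groups[-1][1] += 1          # same frequency within tolerance: extend multiplet
        else:
            groups.append([w, 1])       # start a new multiplet
    return [(w, d) for w, d in groups]

# Hypothetical eigenvalues: six rigid-body zeros (with round-off), a triplet, a quintuplet, a singlet
eigs = [1e-14, -2e-15, 0.0, 3e-15, 0.0, 1e-16] + [0.25] * 3 + [0.36] * 5 + [1.0]
for w, d in multiplets(eigs):
    print(f"omega = {w:.4f}  degeneracy = {d}")
```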

We have calculated the lowest-frequency modes of vibration for the RYMV, HK97 and SV40 capsids using well-known group-theoretical methods. The association energies listed in VIPER for RYMV allow for a stable capsid. Crucial to the stability are the C-arms linking together distant proteins of the C chain in Fig. 3a. The spectrum of the first 40 low-frequency modes is presented in Fig. 4a. Apart from the six zero modes associated with the rotations and translations of the capsid as a whole, which belong to two copies of the irreducible representation $\Gamma^{+}_{3}$ of $I$, one notices a cluster of 24 normal modes of very low and similar frequencies, organised in a sum of irreducible representations of $I$ whose symbols are displayed in Fig. 4a. This low plateau is disrupted by a significant jump in wave number.
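One simple way to make the notion of a plateau followed by a jump quantitative is to look for the largest ratio between consecutive non-zero frequencies. This criterion is our own illustrative assumption, not the procedure behind Fig. 4; the function name is hypothetical and the spectrum below is synthetic.

```python
def plateau_and_jump(frequencies, n_zero=6):
    """Locate the largest relative jump between consecutive non-zero frequencies.

    frequencies -- ascending frequencies, starting with n_zero rigid-body zero modes
    Returns (number of modes below the jump, ratio across the jump).
    """
    nonzero = frequencies[n_zero:]
    ratios = [b / a for a, b in zip(nonzero, nonzero[1:])]
    k = max(range(len(ratios)), key=lambda i: ratios[i])
    return k + 1, ratios[k]

# Synthetic spectrum: 6 zero modes, 24 closely spaced modes, then a jump
freqs = [0.0] * 6 + [1.0 + 0.01 * i for i in range(24)] + [3.0, 3.1, 3.2]
size, ratio = plateau_and_jump(freqs)
print(size, round(ratio, 2))
```

For a spectrum with additional spurious zero modes, such as the nine found for HK97, `n_zero` has to be raised accordingly; otherwise the ratio computation divides by zero.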



[Figure 4 panels: (a) Rice Yellow Mottle Virus (1f2n) spectrum, lowest 40 modes; (b) Hong-Kong 97 spectrum, lowest 40 modes]



Figure 4: Spectrum of low-frequency normal modes for the RYMV capsid (a) and the HK97 capsid (b). Modes drawn as right-pointing triangles belong to 3-dimensional irreducible representations $\Gamma^{+}_{3}$ of the icosahedral group, while modes drawn as upward-pointing triangles belong to 3-dimensional irreducible representations $\Gamma^{+}_{3'}$. Similarly, the diamond-shaped modes belong to 4-dimensional irreducible representations and the pentagon-shaped modes to 5-dimensional irreducible representations. The x-axis labels the normal modes, while the y-axis gives the wave numbers in cm$^{-1}$ (up to an overall normalisation which cannot be fixed from VIPER data).

A similar analysis was performed for HK97. The association energies listed in VIPER for HK97 allow for a nearly stable capsid, with nine strictly zero modes instead of the six expected. The 21 subsequent modes have similar frequencies, as can be seen from Fig. 4b, and are organised in a sum of irreducible representations of $I$ whose symbols are displayed there. If the spurious triplet of zero modes were lifted by the addition of extra bonds in the spring-mass model of HK97, one would again observe a cluster of 24 low-frequency normal modes forming a plateau disrupted by a jump of the same scale as that appearing in RYMV. We have observed this phenomenon in a large number of viral capsids, and we will detail our findings in [25].

The SV40 case is particularly interesting because it does not quite fit the above observations. As mentioned in Section 2, the viral capsid has an approximate centre of inversion, and one might want to explore the implications of performing the normal mode analysis with a symmetry-corrected 'point-mass' protein distribution. This, however, destabilises the capsid, as the vertices of some triangular cells in the network become collinear. We therefore refrain from considering a capsid with an exact centre of inversion, and perform the normal mode analysis as in the two previous cases (RYMV and HK97). Once more, we have plotted the low-frequency spectrum in Fig. 5.



[Figure 5 panel: Simian Virus 40 (1sva) spectrum, lowest 40 modes]



Figure 5: Low-frequency normal modes for the SV40 capsid. Symbol conventions as in Fig. 4.

Apart from the six zero modes associated with the rotations and translations of the capsid as a whole, one could argue that the next 23 modes should be considered as a cluster, since their frequencies are very similar. However, the plateau in this case is not disrupted by a spectacular jump in frequency, as the seven subsequent frequencies are only roughly 1.6 times larger than those of the first 23 non-zero modes. Early comparison with Murine Polyomavirus vibrational patterns does not shed light on the significance of these all-pentamer viral capsid spectra, and more investigations are needed.



4 Conclusion 

We have discussed the vibration spectrum of icosahedral virus capsids, obtained from a coarse-grained model in which protein chains and their interactions are replaced by point masses connected by springs. The goal of this programme is to understand, in a top-down approach, how properties of the capsid structure, such as an approximate inversion symmetry or a particular tiling type, are reflected in the vibrational spectrum. We believe this is a useful complement to existing bottom-up approaches, which are rooted in all-atom computations.

A comparison of our results with the spectra obtained in earlier all-atom computations reveals some interesting similarities. The most striking one is the existence of a low-frequency plateau of 24 modes, separated by a rather large gap from the remainder of the spectrum. This plateau is present for RYMV as well as HK97 and a large number of other virus capsids. It has been seen before in isolated examples [13, 14, 30], but the simplicity of our model offers a better chance of understanding the general reason behind its existence (more details will be provided in [25]).

While Viral Tiling Theory provides a beautiful classification of the structure of virus capsids, 
its role in understanding the vibration spectra is at present less clear. Besides the bonds which 
bind together proteins on the same tile, many other bonds are required in order to obtain a 
stable capsid. These other, inter-tile bonds are often of a similar strength as the bonds on a 
single tile. In fact, it is an interesting mathematical problem to understand the best network 
topology (in terms of the optimal number of bonds) required for stability of a capsid. 

The present analysis focuses exclusively on the viral capsid, ignoring in particular the interaction of the virion with its environment and the presence of matter within the shell, which are undoubtedly worth considering in more elaborate models. Large-scale simulations have revealed that some virus capsids are unstable without their RNA content [21]. It would be interesting to understand this instability, as well as the effect of RNA content, for larger classes of capsids.